We address the challenges of speed and generalization in sketch-based 3D pose estimation with a learn-from-synthesis strategy. Trained on our synthetic sketch-pose dataset, Sketch2PoseNet is a feed-forward network tailored to sketches that efficiently and accurately predicts 3D human poses across a wide variety of sketch styles.

SIGGRAPH Asia 2025    Project Page    Code

We explore learning a native generative model for 360° full heads from limited 3D head data. We study three key problems: 1) identifying effective representations for 360°-renderable head generation; 2) disentangling facial appearance, shape, and motion to enable editable and motion-driven 3D head models; 3) improving model generalization for downstream tasks.
arXiv 2024

We present a novel approach for synthesizing 3D talking heads with controllable emotion, improved lip synchronization, and high rendering quality. To address multi-view inconsistency and limited emotional expressiveness, we propose a ‘Speech-to-Geometry-to-Appearance’ mapping framework trained on the EmoTalk3D dataset, enabling controllable emotion, wide-range view rendering, and fine facial detail.
ECCV 2024    Project Page

We propose STAG4D, a novel framework for high-quality 4D generation that integrates pre-trained diffusion models with dynamic 3D Gaussian splatting. Our method outperforms prior 4D generation methods in rendering quality, spatio-temporal consistency, and robustness, setting a new state of the art for 4D generation from diverse inputs, including text, image, and video.
ECCV 2024    Project Page    Code

Hao Zhu

NJU-3DV Lab, Nanjing University
Assistant Professor, PhD Advisor
Nanjing, China
E-mail: zh@nju.edu.cn